Empirical networks of weighted dyadic relations often contain “noisy” edges that alter the global
characteristics of the network and obfuscate the most important structures therein. Graph pruning
is the process of identifying the most significant edges according to a generative null model, and
extracting the subgraph consisting of those edges. Here, we focus on integer-weighted graphs
commonly arising when weights count the occurrences of an “event” relating the nodes. We introduce a
simple and intuitive null model related to the configuration model of network generation, and derive
two significance filters from it: the Marginal Likelihood Filter (MLF) and the Global Likelihood
Filter (GLF). The former is a fast algorithm assigning a significance score to each edge based on the
marginal distribution of edge weights, whereas the latter is an ensemble approach which takes into
account the correlations among edges. We apply these filters to the network of air traffic volume
between US airports and recover a geographically faithful representation of the graph. Furthermore,
compared with thresholding based on edge weight, we show that our filters extract a larger and
significantly sparser giant component.
I. INTRODUCTION

Graphs or networks are widely used as representations
of the structure and dynamics of complex systems.
Too often in practice, networks of observed dyadic
relationships are too dense to be of immediate use: the
topology of the network is dominated by an abundance of
“noisy” edges that must somehow be removed before the
most significant structures are revealed. This process,
which we refer to as pruning, is particularly useful in
visualizing the so-called “hairball” networks, and can
conceivably enhance the efficacy of community detection
methods by serving as a preconditioner.

We distinguish between the problem of pruning discussed
here on the one hand, and the problem of sparsification
on the other. Sparsification is the problem of approximating
a network using a subgraph with fewer edges
such that some property of the graph is preserved within
a desired tolerance. The goal of sparsification is typically
to compute network characteristics of the original graph,
only at a lower computational cost. Therefore, one must
aim to minimally alter the character of the network in the
process. For instance, when faced with a dense similarity
matrix derived from a large number of data points, it
is desirable to work instead with a sparse subgraph with
the same community structure as the full graph. For such
applications, one may use sparsifiers based on random
spanning trees, or others that explicitly approximate the
spectral properties of the graph Laplacian [9].

The problem of pruning, on the other hand, involves the
removal of a possibly large number of spurious edges that
are believed to obfuscate an unknown core that contains
the most important structures. It is therefore implied
that the coveted core is different from the observed, noisy
graph. The properties of the core, such as its community
structure, are not known a priori, and thus it is not clear
which graph properties, if any, should be preserved in the
process. In fact, the goal should arguably be to alter
important features of the graph until the properties of
the hidden core are revealed.

Graph pruning is most commonly done by thresholding
based on edge weights. This approach equates significance
with edge weight, and fails to take into account
the relationship between the edge, its incident vertices
and their other edges. Therefore, thresholding based on
weight systematically discounts low-degree vertices and
the structures they represent. In order to address this issue,
alternative methods have been proposed, such as the
disparity filter and the GloSS filter. These methods consist of
assigning a p-value to each edge based on a null model of
edge weight distribution, and subsequently filtering out
all but those edges least likely to have occurred due to
pure chance, namely those with the smallest p-values.
The disparity filter accomplishes this by evaluating
all edges incident on a given vertex in relation to
one another. The GloSS filter is a computationally
involved method attempting to preserve the weight
distribution of edges. Here we propose two new measures
of significance based on a different null model. The
first, which we dub the “Marginal Likelihood Filter” (MLF),
is a local significance measure computed independently
for each edge from the marginal probability distribution
of each edge weight, reducing pruning to a sorting problem.
We judge the significance of an edge in relation to
the properties of both of its end vertices. According to
our null model, the higher the degrees of two arbitrary
vertices, the more likely they are to be connected to one
another by chance. Therefore, the higher the degrees of
an edge’s incident vertices, the larger its weight must be
for it to be considered significant. The second, called
the “Global Likelihood Filter” (GLF), is a global measure
computed for each possible subgraph of a given size
“as a whole”, thus taking into account the correlation
among edges. Pruning then consists of finding the subgraph
with the highest significance. While arguably the
more principled approach due to its global nature, GLF
requires the use of Monte Carlo methods and can thus
become prohibitively expensive for large graphs. MLF,
on the other hand, provides a fast (almost linear in the
number of edges) method easily scalable to very large
graphs.

In the following sections we will define the null model 
and derive from it the marginal likelihood edge filter for 
undirected as well as directed weighted networks. Then 
we show how, from the same null model, an ensemble 
approach to pruning can be developed and we describe 
the resulting filter (GLF). We apply the methodology to 
the network of air traffic volume between US airports in 
2012, and demonstrate how the filtered subgraphs differ 
in important topological measures from those obtained 
from simple weight thresholding at a comparable level. 

II. MARGINAL LIKELIHOOD FILTER 

The null model defines a “random” ensemble of graphs
resembling the realized graph. We must therefore select
some attributes of our graph and demand that the random
ensemble possess those attributes. We propose a
null model that preserves the total weight of the realized
graph and its degree sequence on average. Here, by the
degree of a vertex we mean the sum of the weights of all
its incident edges (also known as the node’s strength),
and we assume all weights to be positive integers. Further,
we conceive of a weighted edge as multiple edges of
unit weight.

For a weighted undirected graph then, our null model
assumes that the unit edges of the graph are assigned to
a pair of vertices, one at a time, and independently of one
another. For each edge, the two end points are chosen
independently at random with probabilities proportional
to the degrees. That is, a vertex with a higher realized
degree is proportionally more likely to be assigned to an
edge than a vertex of lower degree. This leads to the
same pair-wise connection probability predicted by the
configuration model [5] in the sparse or classical limit
[12]. Intuitively, vertices $i, j, \ldots$ in this model behave
like chemical reactants in a solution with concentrations
$k_i, k_j, \ldots$, whose pairwise reaction rates are proportional
to both reactant concentrations. Given this null model,
for any arbitrary pair of vertices $i$ and $j$ with degrees $k_i$
and $k_j$, we can compute the probability mass function of
the weight of the edge connecting them.

Suppose the graph possesses a total of $T$ edges (recall
that we count a weighted edge as multiple edges of unit
weight). Throughout, we will assume that $T \gg 1$. Each
unit edge must choose two incident vertices at random,
with probabilities proportional to vertex degrees. The
probability that $m$ out of the $T$ edges will choose nodes
$i$ and $j$ as their end points is given by the binomial
distribution $B(T, p)$. In short, the null model is defined by
the following distribution for the weight $\sigma_{ij}$ of the $(i,j)$
pair:

$$\Pr\left[\sigma_{ij} = m \mid k_i, k_j, T\right] = \binom{T}{m} p^m (1 - p)^{T - m} \tag{1}$$

$$\text{where} \quad p = \frac{k_i k_j}{2T^2}, \qquad T = \frac{1}{2} \sum_i k_i. \tag{2}$$

One can verify that the expected value of the degree of
node $i$ is $\sum_j k_i k_j/(2T) = k_i$. Thus, the ensemble defined
by the null model preserves the degree sequence on average.
We note that, depending on the value of $pT$, for
large $T$ this distribution tends to a Poisson or a normal
distribution. With this distribution at hand, we can proceed
to compute a p-value for the realized value of the
edge weight connecting $i, j$. Denote the realized weight of
the $(i,j)$ edge by $w_{ij}$. Then, we can define the p-value as

$$S_{ij}(w_{ij}) = \sum_{m \ge w_{ij}} \Pr\left[\sigma_{ij} = m \mid k_i, k_j, T\right]. \tag{3}$$

This definition corresponds to a so-called one-tailed test
where higher weights are considered more extreme
regardless of the expected value of the null distribution.
Once we have computed the p-value for all edges, we can
retain only those edges with p-value $S_{ij}(w_{ij}) < \alpha$,
for any threshold $\alpha$ of our choosing, and discard the rest.
This retains the edges least likely to have occurred purely
by “chance” according to the marginal distributions resulting
from the null model. Numerical evaluation of the p-value from the
binomial probability distribution can pose challenges due
to the large factorials involved. For large $T$, one can use
asymptotic approximations of the binomial distribution
instead (Poisson for $pT = O(1)$ and normal for $pT \gg 1$).
Some standard statistics packages include implementations
of the so-called binomial test, which computes precisely
the p-value in question. We use the implementation
in Python’s statsmodels package.
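As a rough illustration of the computation described above, the following sketch evaluates the p-value (3) for every edge of a small undirected graph. It is not the statsmodels-based implementation mentioned in the text: to stay self-contained it sums the binomial tail directly, using log-gamma functions from the standard library to avoid the large factorials. The function names and the toy edge list are hypothetical and serve only as an example.

```python
import math

def log_binom_pmf(m, T, p):
    """Log of Pr[X = m] for X ~ B(T, p), assuming 0 < p < 1."""
    log_choose = math.lgamma(T + 1) - math.lgamma(m + 1) - math.lgamma(T - m + 1)
    return log_choose + m * math.log(p) + (T - m) * math.log1p(-p)

def binom_upper_tail(w, T, p):
    """Pr[X >= w] for X ~ B(T, p): the one-tailed p-value of Eq. (3)."""
    if w <= 0:
        return 1.0
    total = 0.0
    for m in range(w, T + 1):
        term = math.exp(log_binom_pmf(m, T, p))
        total += term
        # Beyond the mean the terms only decrease; stop once they are negligible.
        if m > T * p and term <= 1e-17 * total:
            break
    return min(total, 1.0)

def mlf_pvalues(edges):
    """Map each undirected edge (i, j) -> p-value under the null model (1)-(2).

    `edges` maps a node pair to its positive integer weight.
    """
    strength = {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0) + w
        strength[j] = strength.get(j, 0) + w
    T = sum(edges.values())  # total number of unit edges
    pvals = {}
    for (i, j), w in edges.items():
        p = strength[i] * strength[j] / (2.0 * T * T)  # Eq. (2)
        pvals[(i, j)] = binom_upper_tail(w, T, p)
    return pvals

# Toy example (hypothetical data): a heavy edge between two hubs
# and a lighter edge attached to a low-degree node.
edges = {("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 2}
pvals = mlf_pvalues(edges)
```

Summing the tail term by term is adequate for small examples; for very large $T$ the asymptotic approximations mentioned above, or a library routine for the binomial survival function, would be the more practical choice.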

We can generalize this formalism to the case of
weighted directed graphs. Here, the graph is characterized
by two degree sequences: the in-degree sequence
and the out-degree sequence. For a directed edge between
vertices $i, j$, the realized state consists of

$$\sigma_{ij}: \text{ weight of the directed edge } (i,j) \tag{4}$$

$$k_i^{\mathrm{out}}: \text{ out-degree of node } i \tag{5}$$

$$k_j^{\mathrm{in}}: \text{ in-degree of node } j \tag{6}$$

Again, we assume as the null model that each of the $T$
directed edges must choose a source vertex and a target
vertex independently at random, such that both the
in-degree distribution and the out-degree distribution
reflect the realized values on average. Thus, the source and
target vertices must be chosen with probability proportional
to the nodes’ out and in-degrees respectively. The
weights will be distributed binomially:

$$\Pr\left[\sigma_{ij} = m \mid k_i^{\mathrm{out}}, k_j^{\mathrm{in}}, T\right] = \binom{T}{m} p_{ij}^m (1 - p_{ij})^{T - m} \tag{7}$$

$$\text{where} \quad p_{ij} = \frac{k_i^{\mathrm{out}} k_j^{\mathrm{in}}}{T^2}, \qquad T = \sum_i k_i^{\mathrm{out}}. \tag{8}$$

The p-value is defined just as in (3), replacing $k_i$ with
$k_i^{\mathrm{out}}$ and $k_j$ with $k_j^{\mathrm{in}}$.

Figure 1: (a) Qualitative schematic of the partial order defined by the MLF filter: three pairs of nodes connected by edges of
varying weights. The size of a node represents its (weighted) degree and the thickness of the edge represents its integer
weight. The top case has a higher significance than either of the bottom cases: with the node degrees fixed, a higher edge
weight W results in higher significance. With the weight W fixed, lowering either end-node’s degree results in higher
significance. (b) Four graph measures computed for the US air traffic network (2012) filtered at different levels using the
Marginal Likelihood Filter (solid) and weight thresholding (dashed). The x axis is the proportion of edges retained by the
filtering. Clockwise from top left: 1- Proportion of nodes in the giant component. 2- Clustering coefficient for the giant
component. 3- Diameter of the graph. 4- Clique number of the graph. (c) The Jaccard similarity between the set of “on
edges” produced by the MLF and GLF filters.

A. Excluding self-edges

The connectivity probability (2) used in our null model
corresponds to the configuration model (more specifically,
its sparse limit [12]) in which self-edges are not
precluded. In most applications, whether the observed
network contains self-edges or not, this poses no problem
in principle since randomization necessarily involves
forgetting certain properties of the observed instance.
Furthermore, the sparse limit of the configuration model is a
good approximation of the special loopless case when no
vertex is dominant in terms of weighted degree. However,
if the preclusion of loops is fundamental to the underlying
dynamics or structure of the graph, one can easily
modify the null model to account for this fact. We find
an alternative expression for the connectivity probability
$p_{ij}$ by seeking solutions of the form

$$p_{ij} = f(k_i) f(k_j), \qquad i \neq j \tag{9}$$

such that the degree sequence is preserved on average:

$$k_i = T \sum_{j \neq i} p_{ij} = T f(k_i) \sum_{j \neq i} f(k_j). \tag{10}$$

The difference here is that $i$ is excluded from the sum.
Note that this equation automatically satisfies the loopless
normalization condition $\sum_{i<j} p_{ij} = 1$ as well. To
first order in $f(k_i)/\sum_j f(k_j)$, the solution is $f(k_i) = k_i/(T\sqrt{2})$
and we recover (2). One can continue solving for higher
order terms to compute the loopless solution to arbitrary
precision. Here we demonstrate the solution of the second
order term. Writing $f(k_i) = f_i + \delta_i + O\left((k_i/T)^3\right)$,
where $f_i = k_i/(T\sqrt{2})$ is the first order solution, and solving
for $\delta_i$ to second order, we find

Si = ^ 1 where c = ^ Sj. (11) 

We need only solve for c by summing over all Si, which 
yields 

_ 1 _ Yf 

(l + \/2) Y 


c = 


( 12 ) 
We will continue to refer to the simplified model defined
by (2) as the MLF, and to the modified version defined
by (9), (11) and (12) as the second order loopless MLF.
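As a numerical cross-check on the perturbative treatment, the system (10) can also be solved directly. The sketch below is an assumption-laden alternative, not the second order formula of the text: it applies a damped fixed-point iteration to $k_i/T = f_i\,(\sum_j f_j - f_i)$, starting from the first order solution $f_i = k_i/(T\sqrt{2})$. The function name, the damping factor of 1/2 and the tolerances are all hypothetical choices, and at least three nodes with positive degree are assumed.

```python
import math

def solve_loopless_f(k, tol=1e-12, max_iter=10000):
    """Solve k_i / T = f_i * (sum_j f_j - f_i) for f by damped iteration.

    `k` is the list of weighted degrees; T = sum(k) / 2 as in Eq. (2).
    """
    T = sum(k) / 2.0
    # Start from the first order solution f_i = k_i / (T * sqrt(2)).
    f = [ki / (T * math.sqrt(2.0)) for ki in k]
    for _ in range(max_iter):
        S = sum(f)
        # Average the current value with the value implied by Eq. (10).
        new = [0.5 * fi + 0.5 * (ki / T) / (S - fi) for fi, ki in zip(f, k)]
        change = max(abs(a - b) for a, b in zip(new, f))
        f = new
        if change < tol:
            break
    return f

# Hypothetical degree sequence; p_ij = f[i] * f[j] for i != j.
k = [3, 2, 2, 1]
f = solve_loopless_f(k)
```

The damping matters: without it the update can oscillate indefinitely (for a regular graph it reduces to $f \to a/f$), whereas averaging turns it into a Babylonian-style square-root iteration.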


III. GLOBAL LIKELIHOOD FILTER 

The Marginal Likelihood Filter described above assigns
a p-value independently to each edge, allowing us to define
the pruned graph simply as the subgraph consisting of
the most significant edges. This results in a fast
algorithm running in $O(|E| \log |E|)$ time, where $|E|$ is the
size of the edge set of the graph. However, this comes at
a cost: it is assumed that the statistical significance of
an edge is independent of the presence of other edges. In
this section we develop an ensemble approach to pruning
that avoids this assumption. In order to motivate this
approach, we begin by reviewing the exponential random
graph model.
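The reduction of MLF pruning to a sorting problem can be sketched as follows. The function name and the convention of truncating the number of retained edges to an integer are assumptions made for illustration; the input is any mapping from edges to p-values.

```python
def prune_by_pvalue(pvalues, fraction):
    """Return the given fraction of edges with the smallest p-values."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # Sorting dominates the cost: O(|E| log |E|).
    ranked = sorted(pvalues, key=pvalues.get)
    n_keep = int(fraction * len(ranked))
    return ranked[:n_keep]

# Hypothetical p-values for four edges.
pvalues = {("a", "b"): 0.01, ("a", "c"): 0.5, ("b", "c"): 0.2, ("c", "d"): 0.001}
kept = prune_by_pvalue(pvalues, 0.5)
```

Because the ranking is fixed once, the edge set retained at a more severe truncation is always a subset of the edge set retained at a milder one.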

The so-called exponential random graph model
(ERGM) has been used for decades, especially in the
social sciences, to model unweighted empirical networks
and study the relationship between different graph properties
where simpler regression models fail due to the correlated
nature of the data. The model consists of a probability
measure on the set of all possible graphs with $n$
vertices given by

$$P(G) \propto e^{\theta_1 x_1(G) + \theta_2 x_2(G) + \cdots + \theta_m x_m(G)} \tag{13}$$

where $x_i(G)$ is some graph property such as the total
number of edges or the degree of a specific node,
and $\theta_i$ is a parameter that must be adjusted such that
$\langle x_i(G) \rangle$, the ensemble average of property $x_i$, matches
the observed value. It was shown that this probability
measure can be derived as the Boltzmann (or Gibbs)
distribution for a canonical ensemble of graphs. In other
words, this is the probability measure that maximizes
the entropy while keeping each graph property $x_i$ equal
to a fixed value on average, just as the familiar Boltzmann
distribution of the canonical ensemble in statistical
mechanics yields a maximum entropy ensemble with
a given average energy. The parameter $\theta_i$ then acts as an
inverse temperature whose value adjusts $\langle x_i(G) \rangle$. Given
one such model, one can compute other graph properties
of interest as derivatives of the partition function.

More recently, the ERGM was generalized to weighted
or multi-edge graphs, where a weighted
edge/multi-edge represents the number of “events” associated
with a pair of nodes. Examples include transportation
traffic volume networks where the weight of the edge
between two cities counts the number of “travel events”
between the two cities over a given time period. Here, a
pair of nodes can be viewed as a physical state that can
be occupied by an arbitrary number of particles, and the
weight of their incident edge corresponds to the occupation
number of the state. The crucial difference here is
the distinguishability of the events.
Unlike physical particles, macroscopic events represented
by multi-edge graphs are distinguishable, and thus the
statistics resemble a Boltzmann gas rather than a
Bose-Einstein gas. In concrete terms, this manifests as
a multiplicity number associated with each weighted graph
configuration that must be taken into account in the
partition function.

Let us now define the GLF filtering procedure. Suppose
we have an observed graph $G_0$ with $n$ vertices and
edge weights $w_{ij} \in \{0, 1, 2, \ldots\}$ for $i, j = 1, 2, \ldots, n$, so
that the graph possesses $m$ weighted edges (edges with
positive weight). As the null model, we consider the maximum
entropy ensemble of (integer) weighted graphs $\mathcal{G}$
with $n$ vertices such that the degree sequence is equal to
that of the observed graph on average. This automatically
ensures that the total edge strength of the graph
is also equal to that of the observed graph on average.
Thus, following [12], we obtain the grand canonical
ensemble defined by the Boltzmann distribution

$$P(G) = \frac{1}{Z}\, g[\{\sigma_{ij}\}] \exp\left[ -\sum_{i<j} (\theta_i + \theta_j)\, \sigma_{ij} \right] \qquad \forall G \in \mathcal{G} \tag{14}$$

where $\sigma_{ij} \in \{0, 1, 2, \ldots\}$ is the weight of the $(i,j)$ edge
in $G$, $\theta_i$ is the inverse temperature determining $\langle k_i \rangle$, and

$$g[\{\sigma_{ij}\}] = \frac{\left( \sum_{i<j} \sigma_{ij} \right)!}{\prod_{i<j} \sigma_{ij}!} \tag{15}$$


is the multiplicity of the configuration $\{\sigma_{ij}\}$ resulting
from the distinguishability of the events. Note that using
the partition function $Z$, we can compute the parameters
$\theta_i$ such that the ensemble’s expected value of the total
weight and the degree sequence match those of the observed
graph, namely $T = \frac{1}{2} \sum_{ij} w_{ij}$ and $\{k_i\}$. In the
thermodynamic limit $T \gg 1$, up to an additive constant,
the log-likelihood is given by

$$\log P(G) = \log(N!) + \sum_{i<j} \left[ \sigma_{ij} \log p_{ij} - \log(\sigma_{ij}!) \right] \tag{16}$$

where

$$p_{ij} = \frac{k_i k_j}{2T^2}, \qquad N = \sum_{i<j} \sigma_{ij}. \tag{17}$$

To evaluate the large factorials involved, one can use the
highly accurate Stirling approximation. For the details
of the computations leading to (16) and (17), see Appendix A.

We posit that the most significant subgraph with
$m' < m$ edges is the minimum likelihood subgraph among
the set of all subgraphs of $G_0$ possessing $m'$ non-zero
weights, according to the null distribution (14). This is
the subgraph least likely to have been generated purely
by chance given the null distribution. Note that unlike
the Marginal Likelihood Filter discussed in the previous
section, here we cannot assign independent scores to
edges and select the top $m'$. The term $\log(N!)$ combines
the effect of all existing edges in a realization inseparably.
Therefore, solving this minimum likelihood problem requires
a standard Monte Carlo simulation. For instance,
a Metropolis algorithm can be used where initially a random
set of $m' < m$ edges from $G_0$ are “turned on”, and
then at each step, one of the “on” edges is chosen at random
and replaced with a randomly selected “off” edge.
The change is accepted if it decreases $\log P(G)$ and rejected
with probability $1 - \exp(-\Delta \log P(G))$ otherwise,
all the while keeping track of the minimum likelihood set
found thus far. The change in log-likelihood is easy to
compute since the second term in (16) is a sum over all
“on edges”, and $\Delta \log(N!)$ can also be computed easily
using the Stirling approximation.

Figure 2: Visualizations of the US airport transportation network (2012) pruned using (a) weight thresholding, and (b) the
Marginal Likelihood Filter. In each case, the top 15% of the edges with the respective edge attribute are retained. Both plots
are rendered using the same standard Fruchterman-Reingold layout algorithm with identical parameters.

Figure 3: Visualizations of the network of interdependent occupations in the state of New York, pruned using (a) weight
thresholding, and (b) the Marginal Likelihood Filter. Both are truncated at the 3% level and plotted using a standard
Fruchterman-Reingold layout algorithm with identical parameters.
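The Metropolis search described above can be sketched as follows. This is a minimal illustration under several assumptions rather than a reference implementation: the probabilities $p_{ij}$ are computed from the degrees of the full observed graph as in (17), `math.lgamma` is used in place of the Stirling approximation to evaluate the log-factorials, and the function names, the number of steps and the fixed seed are hypothetical.

```python
import math
import random

def edge_terms(edges):
    """Per-edge contribution w*log(p_ij) - log(w!) to Eq. (16)."""
    strength = {}
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0) + w
        strength[j] = strength.get(j, 0) + w
    T = sum(edges.values())
    terms = {}
    for (i, j), w in edges.items():
        p = strength[i] * strength[j] / (2.0 * T * T)  # Eq. (17)
        terms[(i, j)] = w * math.log(p) - math.lgamma(w + 1)
    return terms

def subgraph_log_likelihood(on_edges, edges, terms):
    """log P(G), Eq. (16), for the subgraph made of the 'on' edges."""
    N = sum(edges[e] for e in on_edges)
    return math.lgamma(N + 1) + sum(terms[e] for e in on_edges)

def glf_prune(edges, m_prime, steps=20000, seed=0):
    """Search for the minimum likelihood subgraph with m_prime edges."""
    rng = random.Random(seed)
    terms = edge_terms(edges)
    pool = list(edges)
    rng.shuffle(pool)
    on, off = pool[:m_prime], pool[m_prime:]  # random initial 'on' set
    N = sum(edges[e] for e in on)
    current = subgraph_log_likelihood(on, edges, terms)
    best, best_set = current, set(on)
    if not on or not off:
        return best_set, best
    for _ in range(steps):
        a, b = rng.randrange(len(on)), rng.randrange(len(off))
        e_out, e_in = on[a], off[b]
        N_new = N - edges[e_out] + edges[e_in]
        delta = (math.lgamma(N_new + 1) - math.lgamma(N + 1)
                 + terms[e_in] - terms[e_out])
        # Accept downhill moves; accept uphill moves with prob. exp(-delta).
        if delta <= 0 or rng.random() < math.exp(-delta):
            on[a], off[b] = e_in, e_out
            N, current = N_new, current + delta
            if current < best:
                best, best_set = current, set(on)
    return best_set, best
```

Only the swapped pair of edges and the change in $\log(N!)$ enter each update, so a step costs constant time regardless of the size of the graph.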


IV. APPLICATION TO REAL WORLD 
NETWORKS 

In this section, we apply weight thresholding and the
Marginal Likelihood Filter to two networks. The first is
the network of US air traffic in 2012 (data courtesy of
Alessandro Vespignani, following the work in [16]). In
this network each node is a US city and an edge weight
represents the air traffic volume between airport(s) in one
city and another, aggregated over the year 2012. The
network is symmetrized and undirected.

Fig. 1(b) summarizes four graph measures computed
for this network truncated at different levels, both using
the MLF and using weight thresholding. The GLF filter
yields results similar to the MLF. The x axis is the
percentage of the total edges retained in the truncated
version. The four measures are the following: 1. the
proportion of nodes remaining in the giant component,
$|V_f|/|V|$; 2. the averaged local clustering for the
giant component, $C_f$; 3. the diameter of the graph,
$D_f$; 4. the clique number of the graph, $\omega(G)$, which
measures the size of the largest complete subgraph, or clique,
found within the graph. We observe that at the
same level of truncation, the significance filter leads to a
much larger giant component. Roughly at the 50% level,
almost all nodes are already in the giant component.
When pruning highly connected graphs, retaining a large
giant component is naturally desirable since decomposing
the graph into a large number of small components
results in the loss of all connectivity information (e.g.
network distance) between most pairs of nodes. Even if
one hopes to accentuate the community structure of the
graph via pruning, it is still preferable for the communities
to remain connected to the giant component via weak
links than to be completely cut off as distinct connected
components. The clustering coefficient for the
weight-threshold truncations remains roughly the same for all
thresholds, whereas the significance filter produces
considerably lower clusterings at severe truncations, suggesting
that the truncated graph is rather sparse. The diameter
(longest shortest path) of the truncated graphs is
also significantly different between the two filters, with
the significance filter yielding rather large diameters at
severe truncations, suggesting a sharper departure from
a fully connected graph. Finally, we observe a significant
difference between the clique numbers of the graphs
according to the two filters. For the weight filter, $\omega$
increases steadily as more and more edges are included,
whereas for the significance filter, it remains at a more
or less constant and low value until about the 90% threshold,
at which point a sharp increase brings it to the level
of the untruncated network. This reinforces the finding
on the clustering coefficient, suggesting that the significance
threshold produces graphs with lower local densities.
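The first of these measures, the proportion of nodes in the giant component, can be computed with a plain breadth-first search. The sketch below is only an illustration with a hypothetical function name and toy input; it is not the code used to produce Fig. 1(b).

```python
from collections import deque

def giant_component_fraction(nodes, edges):
    """Fraction of `nodes` lying in the largest connected component."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / len(adj) if adj else 0.0

# Hypothetical pruned graph: nodes that lost all their edges still count
# in the denominator, as in the ratio |V_f|/|V|.
frac = giant_component_fraction("abcde", [("a", "b"), ("b", "c"), ("d", "e")])
```

Passing the full node set of the original graph, rather than only the nodes that survive pruning, is what makes the measure comparable across truncation levels.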

Fig. 2 compares the US airport transportation network
truncated using the MLF and weight thresholding.
(Again, the GLF produces results very similar to
the MLF.) In both cases 15% of the edges are retained
and the plots are rendered using a generic force-directed
layout algorithm (Fruchterman-Reingold) with identical
parameters. While the weight thresholded graph still
appears as a “hairball” graph, the significance-filtered
graph naturally unfolds into what resembles the actual
geographical distribution of the nodes almost perfectly.
This particular effect is in part due to the removal of
long-range high-volume edges that are nevertheless assigned
a low significance due to the high strength of their
incident vertices. For instance, the edges (Los Angeles,
New York City) and (Chicago, San Francisco) are absent
from this truncation despite their large weight. Our filter
is thus prioritizing local connections over long-range
connections, indicating the higher importance of these links
with respect to the overall traffic volume of their two
end points. This is of course specific to this network,
but demonstrates that some underlying property of the
network is revealed by pruning. Finally, Fig. 1(c) compares
the edge sets selected by the MLF and GLF filters
$(E_{\mathrm{MLF}}, E_{\mathrm{GLF}})$ at different truncation levels. The y axis
is the Jaccard similarity between the two edge sets, defined
as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \tag{18}$$

We observe that the two filters show a high level of 
similarity—over 80% for truncation thresholds as low as 
20%. The disparity is of course to be expected, since 
in the MLF scheme, the edge set at a lower threshold is 
necessarily a subset of the edge set at a higher threshold 
whereas in the GLF scheme, it is possible that an edge 
which was absent at a higher threshold will be present at 
a lower threshold (more severe truncation). 
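For concreteness, the Jaccard similarity (18) between two edge sets can be computed as in the short sketch below. The helper names are hypothetical; undirected edges are stored as frozensets so that (u, v) and (v, u) compare equal, and two empty sets are treated as identical by convention.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def undirected(edges):
    """Canonical form of an undirected edge list."""
    return {frozenset(e) for e in edges}

# Hypothetical edge sets from two filters at the same truncation level.
e_mlf = undirected([("a", "b"), ("b", "c"), ("c", "d")])
e_glf = undirected([("b", "a"), ("c", "b"), ("d", "e")])
similarity = jaccard(e_mlf, e_glf)
```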

We also applied the second order loopless MLF to the
air traffic network, and found that the pruned graph
is virtually indistinguishable from that obtained from
the first order (standard) MLF. To be specific, the two
pruned graphs differed on a handful of edges. Most notably,
the Chicago-Minneapolis and Columbus-Atlanta
edges were absent from the result of the loopless MLF,
but their absence barely changed the graph layout, suggesting
that the standard MLF can serve as a good approximation to the
loopless case as well.

Next we applied the MLF to the network of interdependent
occupations in the state of New York derived from
the database of campaign contributions published by the
Federal Election Commission (FEC). This database contains
every monetary contribution over $200 by an individual
to a federal US election campaign since 1979.
Among other information, each record contains the occupation
and employer of the donor at the time of the
transaction. The author has compiled a “disambiguated”
version of this database (to be published separately in the
near future), meaning one in which all transactions from
the same individual are linked. This database consists of
roughly 24,000,000 records from around 6,000,000 individuals.
Using this disambiguated database one can produce
a co-occurrence network of occupations where each
node is a distinct occupation label and two labels are
linked if both appear in an individual’s history. Thus, an
edge can indicate either semantic equivalence (“Doctor”
and “Physician”) or a plausible transition (“Postdoc” to
“Professor”). The weight of an edge counts the number
of times the two end-nodes co-occurred in histories
of individuals. For instance, “Lawyer” and “Attorney”
are linked with a rather large weight. Similarly, “President”
and “CEO” are strongly linked. Due to the large
size of the database however, many apparently unrelated
occupations are also linked, albeit with lower weights,
e.g., a “Doctor” who at some other time identifies as a
“Writer”. We can now ask if we can uncover the structure
of interrelated occupations by pruning the dense
co-occurrence network. Figure 3 shows this network pruned
using weight thresholding and the MLF. In both cases,
the top 3% of the edges are retained. As with the air
transportation network, weight thresholding does little
to reveal the important structures within the network
while the MLF untangles the network into clearly visible
clusters: legal professions are connected through “Partner”
to the large cluster of top management occupations,
“Homemaker” is in a distinct cluster together with creative
and philanthropic occupations, and medical and
academic occupations are each in their own clusters.


V. DISCUSSION 

In order to extract the most significant substructures
in a complex network, we have proposed a generative
null model, and studied two edge filters resulting from
this model. Our significance measures are derived from a
null model that preserves the total edge strength and the
weighted degree sequence of the graph on average. Simply
put, this null model states that if everything were
random, two arbitrary vertices would be connected with
probability proportional to both their weighted degrees
(strengths). In the first filter, the degree of deviation
from this null model in the observed network at the edge
level, expressed as a p-value, defines the marginal significance
of an edge. When applied to real-world networks,
this filter extracts subgraphs that are significantly
sparser (as measured by clustering, clique number and
shortest path length) than one would obtain from simple
weight thresholding at the same level, even though it
yields higher global connectivity as reflected by the size
of the giant component. In the second filter, the likelihood
of the occurrence of each subgraph as a whole determines
its global significance, and the most significant subgraph
corresponds to the minimum likelihood subgraph.
Visual inspection of the US airport transportation network
filtered using our significance measures reveals how
low-weight regional links are prioritized over high-weight
long-range links such that the original “hairball” network
unfolds into a rather flat graph closely reflecting the actual
geographical distribution of the nodes.

Regarding the relationship between the two filters, one
more point merits discussion. Equations (10) and
(A9) are equivalent. Furthermore, the Gibbs distribution
found in (A11) is simply a multinomial distribution when
the total edge weight is constrained. It is a standard
result that when the joint distribution of multiple variables
is multinomial, the marginal distribution of each
variable is simply binomial. We see then, that the edge
weight distribution used in MLF is simply the marginal
distribution of the Gibbs distribution used in GLF.
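The standard result invoked here can be made explicit in a few lines. The following is a sketch under the assumption that the total weight is fixed at $T$ and that the pair probabilities $p_{kl}$ sum to one over all pairs; the sums and products run over all pairs $(k,l) \neq (i,j)$, and the second line follows from the multinomial theorem applied to the remaining $T - m$ unit edges.

```latex
\begin{align}
\Pr[\sigma_{ij} = m]
  &= \sum_{\substack{\{\sigma_{kl}\} \\ \sum \sigma_{kl} = T - m}}
     \frac{T!}{m! \prod_{(k,l) \neq (i,j)} \sigma_{kl}!}\;
     p_{ij}^{m} \prod_{(k,l) \neq (i,j)} p_{kl}^{\sigma_{kl}} \\
  &= \binom{T}{m}\, p_{ij}^{m}
     \Bigl( \sum_{(k,l) \neq (i,j)} p_{kl} \Bigr)^{T - m}
   = \binom{T}{m}\, p_{ij}^{m}\, (1 - p_{ij})^{T - m}.
\end{align}
```

The result has the same form as the binomial distribution (1) that defines the MLF null model.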

Let us also emphasize the fact that unlike in sparsification,
the goal in pruning is to reveal unknown structures
which are obscured by noise. This is why the problem of
pruning is not an approximation problem, as there are no
objective measures of success. Therefore, the merits of a
pruning filter such as ours can only be judged by the
assumptions defining the null model. A central concern
is the balance between over and under-determination.
The null model should sufficiently reflect the essential
features of the observed graph through its defining
constraints. However, one must avoid imposing too many
constraints or else the null model will be incapable of
accounting for the natural variations in the (unknown)
ensemble from which the observed graph is obtained. In
this context, our filter can be viewed as a middle ground
between the disparity filter, which defines the null
model only based on a node’s incident edges, and the
GloSS filter, which preserves the global distribution
of edge weights.

Finally, we outline a number of potential future directions.
For highly skewed degree distributions, the asymptotic
expansions of different equations in (10) may need
to be truncated at different orders depending on $k_i/T$ in
order to produce balanced solutions. Another remaining
question is whether one can define an optimal significance
threshold for pruning a given graph. Presumably, some
combination of the graph measures discussed in Fig. 1(b) as
well as other relevant measures can aid the practitioner
in determining the appropriate truncation level. However,
whether or not a generically applicable recipe can
be defined remains to be seen.


ACKNOWLEDGMENTS 

The author wishes to thank Elaine Stranahan, Nima 
Dehmamy and Oleguer Sagarra for fruitful discussions. 
This material is based upon work supported in part 
by the National Science Foundation under grant No. 
502019. 


Appendix A: 

In this appendix, we will compute the partition function
for the ensemble defined by (14) and enforce the
constraints on the degree sequence by solving for the
parameters $\theta_i$, $i = 1, 2, \ldots, n$. The partition function is
defined by

$$Z = \sum_{\{\sigma_{ij}\}} g[\{\sigma_{ij}\}] \exp\left[ -\sum_{i<j} (\theta_i + \theta_j)\, \sigma_{ij} \right] \tag{A1}$$

$$= \sum_{N=0}^{\infty} \; \sum_{\{\sigma_{ij}\}:\, \sum_{i<j} \sigma_{ij} = N} N! \prod_{i<j} \frac{1}{\sigma_{ij}!}\, e^{-(\theta_i + \theta_j)\sigma_{ij}}. \tag{A2}$$

Using the multinomial theorem, the inner sum simplifies:

$$Z = \sum_{N=0}^{\infty} \left( \sum_{i<j} e^{-(\theta_i + \theta_j)} \right)^{N} \tag{A3}$$

$$= \frac{1}{1 - \sum_{i<j} e^{-(\theta_i + \theta_j)}}. \tag{A4}$$

The expected values of the degrees can be computed as
follows:

$$\langle k_i \rangle = -\frac{\partial \log Z}{\partial \theta_i}. \tag{A5}$$

Setting these equal to the $k_i$ respectively and defining

$$x_i = e^{-\theta_i}, \qquad i = 1, 2, \ldots, n \tag{A6}$$

after rearranging the terms we obtain

$$k_i = \frac{x_i \sum_{j \neq i} x_j}{1 - \sum_{l<m} x_l x_m} \tag{A7}$$

$$\sum_i k_i = 2T. \tag{A8}$$

Summing the first equation over $i$ and using the second
equation yields $\sum_{i<j} x_i x_j = T/(1 + T)$. Thus we obtain
a system of $n$ nonlinear equations

$$\frac{k_i}{1 + T} = x_i \left( \sum_j x_j - x_i \right), \qquad i = 1, 2, \ldots, n. \tag{A9}$$

Note that, for $T \gg 1$, this is identical to equation (10) arising in the
loopless MLF. Using the fact that $\sum_i k_i = 2T \gg 1$, to
first order in terms of $x_i / \sum_j x_j$ the solution is

$$e^{-\theta_i} = x_i \approx \frac{k_i}{T\sqrt{2}}. \tag{A10}$$

Therefore, the probability distribution (14) becomes

$$P(G) = \frac{1}{Z}\, g[\{\sigma_{ij}\}] \prod_{i<j} \left( \frac{k_i k_j}{2T^2} \right)^{\sigma_{ij}} \qquad \forall G \in \mathcal{G} \tag{A11}$$

and we obtain (16).